Framework for kernel regularization with application to protein clustering.
نویسندگان
چکیده
We develop and apply a previously undescribed framework that is designed to extract information in the form of a positive definite kernel matrix from possibly crude, noisy, incomplete, inconsistent dissimilarity information between pairs of objects, obtainable in a variety of contexts. Any positive definite kernel defines a consistent set of distances, and the fitted kernel provides a set of coordinates in Euclidean space that attempts to respect the information available while controlling for complexity of the kernel. The resulting set of coordinates is highly appropriate for visualization and as input to classification and clustering algorithms. The framework is formulated in terms of a class of optimization problems that can be solved efficiently by using modern convex cone programming software. The power of the method is illustrated in the context of protein clustering based on primary sequence data. An application to the globin family of proteins resulted in a readily visualizable 3D sequence space of globins, where several subfamilies and subgroupings consistent with the literature were easily identifiable.
منابع مشابه
A Framework for Kernel Regularization with Application to Protein Clustering
We develop and apply a novel framework which is designed to extract information in the form of a positive definite kernel matrix from possibly crude, noisy, incomplete, inconsistent dissimilarity information between pairs of objects, obtainable in a variety of contexts. Any positive definite kernel defines a consistent set of distances, and the fitted kernel provides a set of coordinates in Euc...
متن کاملMultiple Kernel Clustering Framework with Improved Kernels
Multiple kernel clustering (MKC) algorithms have been successfully applied into various applications. However, these successes are largely dependent on the quality of pre-defined base kernels, which cannot be guaranteed in practical applications. This may adversely affect the clustering performance. To address this issue, we propose a simple while effective framework to adaptively improve the q...
متن کاملیادگیری نیمه نظارتی کرنل مرکب با استفاده از تکنیکهای یادگیری معیار فاصله
Distance metric has a key role in many machine learning and computer vision algorithms so that choosing an appropriate distance metric has a direct effect on the performance of such algorithms. Recently, distance metric learning using labeled data or other available supervisory information has become a very active research area in machine learning applications. Studies in this area have shown t...
متن کاملEncoding Dissimilarity Data for Statistical Model Building.
We summarize, review and comment upon three papers which discuss the use of discrete, noisy, incomplete, scattered pairwise dissimilarity data in statistical model building. Convex cone optimization codes are used to embed the objects into a Euclidean space which respects the dissimilarity information while controlling the dimension of the space. A "newbie" algorithm is provided for embedding n...
متن کاملKernel Cuts: MRF meets Kernel&Spectral Clustering
The log-likelihood energy term in popular model-fitting segmentation methods, e.g. [64, 14, 50, 20], is presented as a generalized “probabilistic” K-means energy [33] for color space clustering. This interpretation reveals some limitations, e.g. over-fitting. We propose an alternative approach to color clustering using kernel K-means energy with well-known properties such as non-linear separati...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Proceedings of the National Academy of Sciences of the United States of America
دوره 102 35 شماره
صفحات -
تاریخ انتشار 2005